[GPU] Recognize parameters as valid inputs for compressed weights #32276
base: master
Conversation
mklimenk left a comment
The two branches of the `if (pattern_map.count(weights_const_m))` condition share a lot of similarities; please consider refactoring them to avoid code duplication.
src/plugins/intel_gpu/src/plugin/transformations/convert_fc_to_compressed.cpp
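As a minimal sketch of the suggested deduplication (not the actual patch code; `weights_param_m` and `process_weights` are hypothetical names for this illustration), the shared work could be hoisted into a lambda so the if/else only selects which matched node supplies the weights:

```cpp
// Hypothetical sketch: the logic previously duplicated in both branches moves
// into one lambda; the condition merely picks the matched weights node.
const auto& pattern_map = m.get_pattern_value_map();

auto process_weights = [&](const ov::Output<ov::Node>& weights) {
    // ...handling previously repeated in both branches...
};

if (pattern_map.count(weights_const_m)) {
    process_weights(pattern_map.at(weights_const_m));
} else {
    process_weights(pattern_map.at(weights_param_m));  // hypothetical parameter pattern node
}
```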
This change enables quantized LoRA weights, passed as parameters at execution time, to be recognized by the transformations that produce `FullyConnectedCompressed` nodes for QGEMM execution.
The test previously expected the transformation to fail because `input2` was used as a weight. The new logic allows parameters to be used as weights, so the test has been adjusted to expect a successful transformation.
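A minimal sketch, assuming OpenVINO's standard pattern-matching API, of how the weights input could be relaxed to accept either a Constant or a Parameter (variable names are illustrative; the actual diff may differ):

```cpp
#include "openvino/op/constant.hpp"
#include "openvino/op/parameter.hpp"
#include "openvino/pass/pattern/op/or.hpp"
#include "openvino/pass/pattern/op/wrap_type.hpp"

// Before: only a Constant could match the compressed-weights input.
auto weights_const_m = ov::pass::pattern::wrap_type<ov::op::v0::Constant>();
// After: a Parameter (e.g. quantized LoRA weights supplied at runtime) matches too.
auto weights_param_m = ov::pass::pattern::wrap_type<ov::op::v0::Parameter>();
auto weights_m = std::make_shared<ov::pass::pattern::op::Or>(
    ov::OutputVector{weights_const_m, weights_param_m});
```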
mklimenk left a comment
Looks much cleaner now, thanks!
@CuriousPanCake please review.
Details:
Description of the issue:
At present, the `FC_COMPRESSED_WEIGHT_PATTERN` macro contains a pattern for dequantization of a constant integer weight. This pattern is used to recognize cases where fused weight dequantization can be applied, replacing them with `FullyConnectedCompressed` nodes. Because it expects a constant weight input, the pattern fails to recognize quantized LoRA weights, which are provided as parameters.
With the changes in this patch, these weights are recognized, and the transformations can proceed and produce nodes that then leverage oneDNN fused QGEMM for execution.
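For illustration only, a minimal sketch (not code from the patch) of the kind of weight-dequantization subgraph involved, with the compressed weights arriving as a `Parameter` rather than a `Constant`; shapes and quantization constants below are made up:

```cpp
#include "openvino/op/constant.hpp"
#include "openvino/op/convert.hpp"
#include "openvino/op/matmul.hpp"
#include "openvino/op/multiply.hpp"
#include "openvino/op/parameter.hpp"
#include "openvino/op/subtract.hpp"

// Activations and compressed (u8) weights both enter as Parameters; the weights
// are dequantized inline: Convert -> Subtract zero-point -> Multiply scale.
auto activations = std::make_shared<ov::op::v0::Parameter>(ov::element::f16, ov::Shape{1, 16});
auto weights     = std::make_shared<ov::op::v0::Parameter>(ov::element::u8,  ov::Shape{32, 16});
auto convert     = std::make_shared<ov::op::v0::Convert>(weights, ov::element::f16);
auto zero_point  = ov::op::v0::Constant::create(ov::element::f16, ov::Shape{32, 1}, {8.0f});
auto scale       = ov::op::v0::Constant::create(ov::element::f16, ov::Shape{32, 1}, {0.1f});
auto dequantized = std::make_shared<ov::op::v1::Multiply>(
    std::make_shared<ov::op::v1::Subtract>(convert, zero_point), scale);
// The dequantized weights feed a MatMul; after the transformation this whole
// chain would be folded into a single FullyConnectedCompressed node.
auto matmul = std::make_shared<ov::op::v0::MatMul>(activations, dequantized, false, true);
```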
Tickets: